Automatic Linefeed Insertion for Improving Readability of Lecture Transcript
Authors
Abstract
A captioning system that supports real-time understanding of monologue speech, such as lectures and commentaries, is needed. Because sentences in monologues tend to be long, each sentence is often displayed across multiple lines on the screen and becomes hard to read. In such cases, linefeeds must be inserted into the text so that it becomes easy to read. This paper proposes a technique for inserting linefeeds into Japanese spoken monologue sentences as an elemental technique for generating readable captions. Our method inserts linefeeds into a sentence by applying rules based on morphemes, dependencies, and clause boundaries. We established the rules by investigating in detail a corpus annotated with linefeeds. An experiment using a Japanese monologue corpus has shown the effectiveness of our rules.
Similar resources
Improving the Readability of Class Lecture Automatic Speech Recognition Results Using Multiple Hypotheses
This paper presents a method for improving the readability of class lecture Automatic Speech Recognition (ASR) results, which hitherto have been difficult for humans to understand, even in the absence of recognition errors. This is because the speech in a class lecture is relatively casual and contains many ill-formed utterances with filled pauses, restarts, and so on. Recently there has been e...
Automatic Comma Insertion of Lecture Transcripts Based on Multiple Annotations
To enhance readability and usability of speech recognition results, automatic punctuation is an essential process. In this paper, we address automatic comma prediction based on conditional random fields (CRF) using lexical, syntactic and pause information. Since there is large disagreement in comma insertion between humans, we model individual tendencies of punctuation using annotations given b...
Improving the readability of class lecture ASR results using a confusion network
This paper presents a method for improving the readability of Automatic Speech Recognition (ASR) results for classroom lectures. Most of the previous research on improving the readability of recognition results focused mainly on manually transcribed texts, and not ASR results. Due to the presence of a large number of domain-dependent words and the casual presentation style, even state-of-the-ar...
Improving the Readability of ASR Results for Lectures Using Multiple Hypotheses and Sentence-Level Knowledge
This paper presents a novel method for improving the readability of automatic speech recognition (ASR) results for classroom lectures. Because speech in a classroom is spontaneous and contains many ill-formed utterances with various disfluencies, the ASR result should be edited to improve the readability before presenting it to users, by applying some operations such as removing disfluencies, d...
Construction of linefeed insertion rules for lecture transcript and their evaluation
The development of a captioning system that supports the real-time understanding of monologue speech such as lectures and commentaries is required. In monologues, since a sentence tends to be long, each sentence is often displayed across multiple lines on the screen. In such cases, it is necessary to insert linefeeds into the text so that it becomes easy to read. This paper proposes a rule-based tec...